An atlas of healthy and injured cell states and niches in the human kidney

Understanding kidney disease relies on defining the complexity of cell types and states, their associated molecular profiles and interactions within tissue neighbourhoods¹. Here we applied multiple single-cell and single-nucleus assays (>400,000 nuclei or cells) and spatial imaging technologies to a broad spectrum of healthy reference kidneys (45 donors) and diseased kidneys (48 patients). This has provided a high-resolution cellular atlas of 51 main cell types, which include rare and previously undescribed cell populations. The multi-omic approach provides detailed transcriptomic profiles, regulatory factors and spatial localizations spanning the entire kidney. We also define 28 cellular states across nephron segments and interstitium that were altered in kidney injury, encompassing cycling, adaptive (successful or maladaptive repair), transitioning and degenerative states. Molecular signatures permitted the localization of these states within injury neighbourhoods using spatial transcriptomics, while large-scale 3D imaging analysis (around 1.2 million neighbourhoods) provided corresponding linkages to active immune responses. These analyses defined biological pathways that are relevant to injury time-course and niches, including signatures underlying epithelial repair that predicted maladaptive states associated with a decline in kidney function. This integrated multimodal spatial cell atlas of healthy and diseased human kidneys represents a comprehensive benchmark of cellular states, neighbourhoods, outcome-associated signatures and publicly available interactive visualizations.


Statistics
For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section.
- The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement
- A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly
- The statistical test(s) used AND whether they are one- or two-sided (only common tests should be described solely by name; describe more complex techniques in the Methods section)
- A description of all covariates tested
- A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons
- A full description of the statistical parameters including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient) AND variation (e.g. standard deviation) or associated estimates of uncertainty (e.g. confidence intervals)
- For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted (give P values as exact values whenever suitable)
- For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings
- For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes
- Estimates of effect sizes (e.g. Cohen's d, Pearson's r), indicating how they were calculated

Software and code
Policy information about availability of computer code
Data collection: 10x Chromium v3 and Illumina NovaSeq 6000 instrument control software (v1.6.0 and 1.7.0); Leica LAS X software (v3.5). 3D label-free autofluorescence and fluorescence imaging data were captured using a Leica SP8 confocal scan head mounted on an upright DM6000 microscope. For large-scale imaging of tissues at submicron resolution, the Leica Tile Scan function was used to collect a mosaic of smaller image volumes using a high-power, high-numerical-aperture objective. Leica LAS X software (v3.5) was then used to stitch these component volumes into a single image volume of the entire sample. The scanner zoom and focus motor control were set to provide voxel dimensions of 0.5 × 0.5 µm laterally and 1 µm axially. 2D immunofluorescence images and data were captured using a Nikon EZ-C1 (3.91) confocal system, and images were produced using NIS-Elements software (BR 3.2, 64-bit).
Data analysis: Code to reproduce the figures is available to download from github.com/KPMP/Cell-State-Atlas-2022.
snCv3 and scCv3 sample demultiplexing, barcode processing, and gene expression quantifications were performed with the 10X Cell Ranger v3 pipeline using the GRCh38 (hg38) or GRCh37 (hg19, indicated in Comments column of Supplementary

Kaira code for analysis of chromatin data is provided at github.com/yanwu2014/chromfunks.
Slide-seq2 demultiplexing, genome alignment and spatial matching were performed using Slide-seq tools (github.com/MacoskoLab/slideseqtools/releases/tag/0.1). Slide-seq analysis was performed using:

Data
Policy information about availability of data
Processed data, interactive and visualization tools: The snCv3, scCv3, SNARE2, Slide-seq and Visium processed data files are all available for download from GEO (SuperSeries GSE183279). snCv3 healthy reference data are available for reference-based single-cell mapping by the Azimuth tool: https://azimuth.hubmapconsortium.org/. All snCv3 and scCv3 processed data can be accessed and viewed at cellxgene (https://cellxgene.cziscience.com/collections/bcb61471-2a44-4d00-a0af-ff085512674c). snCv3 (excluding COVID-AKI and CKD nephrectomy samples), scCv3, Visium (KPMP biopsies) and 3D imaging data can all be visualized and interrogated using the KPMP Data Atlas Explorer: https://atlas.kpmp.org/explorer/. For 3D imaging, the cytometry, cell classifications, gates and neighborhood analysis data are located at: https://doi.org/10.5281/zenodo.7120941.
Raw sequencing and imaging data: Raw sequencing data are under controlled access (human data) because they are potentially identifiable, and can be accessed from the respective sources indicated below (summarized in Supplementary Table 1). Raw and processed sequencing and imaging data (snCv3, scCv3, 3D imaging, Slide-seq, Visium) generated as part of the Kidney Precision Medicine Project (KPMP) have been deposited at https://atlas.kpmp.org/repository/ and compiled at https://doi.org/10.48698/3z31-8924. Raw sequencing data can be requested and are available upon signing a data use agreement with KPMP. Raw sequencing data (snCv3, SNARE2, Slide-seq) generated as part of the Human BioMolecular Atlas Program (HuBMAP) have been deposited at https://portal.hubmapconsortium.org/ and compiled at https://doi.org/10.35079/hbm776.rgsw.867. The HuBMAP raw data are available for download from the database of Genotypes and Phenotypes (dbGaP, phs002249). snCv3 data not deposited to KPMP or HuBMAP are available from GEO (GSE183279) or, for COVID-AKI raw sequencing files, upon request from WU KTRC (sanjayjain@wustl.edu) due to patient confidentiality.

Field-specific reporting
Life sciences

Life sciences study design
All studies must disclose on these points even when the disclosure is negative.
Sample size
Sample sizes were not predetermined by statistical methods owing to the nature of this study. Its strength lies in the number of individuals analyzed, the technologies represented for orthogonal validation and the number of cells analyzed (more than any existing study of the kidney). For snCv3 (n = 36), scCv3 (n = 45), SNARE2 (n = 7), 3D imaging (n = 15), 10x Visium (n = 22) and Slide-seq (n = 6), single nuclei, single cells or tissue sections were obtained from living or deceased donor tissues ("n" here refers to individuals; the number of independent samples is explained in detail in the "Replication" section below). These were obtained from healthy reference, AKI or CKD individuals. To ensure robust cell state profiles, reference tissues were obtained from multiple sources, and biopsies were collected from AKI and CKD patients under rigorous quality assurance and control procedures. This ensured that cell type clusters were not driven by technical artifacts and that our analyses were rigorous and reproducible.

Data exclusions
Low-quality cells or nuclei were excluded from analyses based on established quality-filtering metrics:

SNARE2-AC:
- Cell barcodes not passing RNA QC filters
- < 0.15 TSS enrichment
- < 1000 read fragments or 500 UMI per cell
- < 0.15 of read fragments overlapping promoter regions
- Samples showing < 500 dual-omic cells after quality filtering
- Gene/UMI ratio filter (Pagoda2)
Visium 10x: In each Visium sample, spots were eliminated if they did not overlie tissue. In addition, the outermost layer of spots was eliminated from comparative analyses if the edge had been manually cut with a razor.
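The SNARE2-AC exclusion criteria listed above can be sketched as a barcode-level filter followed by a sample-level filter. This is a minimal illustration, not the authors' pipeline. The `Barcode` record and its field names are hypothetical. The sketch assumes that the fragment and UMI cutoffs are applied independently, so a barcode must pass both, and it omits the Pagoda2 gene/UMI ratio filter.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Barcode:
    # Hypothetical per-barcode QC record; field names are illustrative only.
    sample: str
    passes_rna_qc: bool
    tss_enrichment: float
    n_fragments: int
    n_umi: int
    frac_promoter: float


# Cutoffs taken from the list above.
MIN_TSS = 0.15
MIN_FRAGMENTS = 1000
MIN_UMI = 500
MIN_PROMOTER_FRAC = 0.15
MIN_CELLS_PER_SAMPLE = 500


def passes_barcode_filters(bc: Barcode) -> bool:
    """Barcode-level filters (assumes fragment and UMI cutoffs must both be met)."""
    return (
        bc.passes_rna_qc
        and bc.tss_enrichment >= MIN_TSS
        and bc.n_fragments >= MIN_FRAGMENTS
        and bc.n_umi >= MIN_UMI
        and bc.frac_promoter >= MIN_PROMOTER_FRAC
    )


def filter_snare2_ac(barcodes: list[Barcode]) -> list[Barcode]:
    """Keep passing barcodes, then drop samples with too few dual-omic cells left."""
    kept = [bc for bc in barcodes if passes_barcode_filters(bc)]
    cells_per_sample = Counter(bc.sample for bc in kept)
    return [bc for bc in kept if cells_per_sample[bc.sample] >= MIN_CELLS_PER_SAMPLE]
```

The sample-level count is taken after barcode filtering, matching the wording "after quality filtering".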
Replication
RNA-seq: snCv3 data were generated from 44 independent samples or experiments covering 36 individuals, scCv3 data from 49 samples covering 45 individuals, and SNARE2 data from 17 samples covering 7 individuals. snCv3 clustering analysis was performed at multiple k values, and cluster assignments were made using a defined process (see Methods). Reproducibility of the assigned cell type annotations was evident from consistently aligned populations across technologies (scCv3, SNARE2, Slide-seq, Visium) and high correlation values with reference (published) data sets.
Imaging: For 3D imaging and immunofluorescence staining experiments, each stain was repeated in at least 2 separate individuals or separate regions. For ISH, each stain was performed on 6 separate individuals. For Visium spatial transcriptomics, 23 samples from 22 individuals were included in the analysis, with at least 6 samples from each of the reference, CKD and AKI categories. For Slide-seq, we generated 31 cortical and 36 medullary pucks from 6 individuals. For immunofluorescence validation studies, commercially available antibodies were used, and the immunostaining included tissue from patients not contributing to omics data. Similarly, orthogonal validation of omics annotations and spatial localization in Visium studies included more than four samples each from reference and disease biopsies that were not used to generate single-cell gene expression data. This heterogeneity in sampling demonstrated the reproducibility and rigor of the atlas. All attempts at replication were successful for these imaging experiments.
Further, several technologies were applied to samples from the same individual, and in some cases the same tissue block was used to generate multimodal data.
Randomization
Randomization was not used because it was not relevant to this study design: healthy and diseased samples were obtained as they became available. Generation of data and processed files was agnostic to disease condition. Batch effects were corrected by scaling the expression of each gene to the dataset-wide average, and cell type and cluster contribution plots showed that the remaining batch effects were minimal.
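One way to read the batch-correction step above is as a per-batch rescaling. In this reading, each gene's mean within a batch is scaled to match that gene's dataset-wide mean. The sketch below assumes this interpretation and a cells × genes matrix of already-normalized expression. The function name `scale_to_dataset_mean` is hypothetical, and the sketch is not presented as the authors' implementation.

```python
import numpy as np


def scale_to_dataset_mean(expr: np.ndarray, batches: np.ndarray) -> np.ndarray:
    """Rescale each gene within each batch so its batch mean equals the dataset-wide mean.

    expr: cells x genes matrix of normalized expression (assumed input format).
    batches: length-n_cells array of batch labels.
    """
    expr = np.asarray(expr, dtype=float)
    global_mean = expr.mean(axis=0)  # per-gene mean over all cells
    out = expr.copy()
    for b in np.unique(batches):
        mask = batches == b
        batch_mean = expr[mask].mean(axis=0)
        # Per-gene scale factor; genes with zero mean in this batch are left unscaled.
        factor = np.divide(
            global_mean,
            batch_mean,
            out=np.ones_like(global_mean),
            where=batch_mean > 0,
        )
        out[mask] = expr[mask] * factor
    return out
```

The correction multiplies values rather than shifting them, so zeros remain zeros and sparse count-derived matrices stay sparse.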
Blinding
All human specimens used in this study were de-identified; however, selected attributes (condition, age, sex) were available to all investigators. Most analyses were not performed blind because these sample attributes were needed for accurate annotation of cell types or states and for the design of downstream analyses to create maps.

Reporting for specific materials, systems and methods
Antibodies
MPO - https://www.abcam.com/myeloperoxidase-antibody-ab9535.html
DyLight-594 - https://www.thermofisher.com/antibody/product/Donkey-anti-Rabbit-IgG-H-L-Cross-Adsorbed-Secondary-Antibody-Polyclonal/SA5-10040
CD68 - https://www.agilent.com/en/product/immunohistochemistry/antibodies-controls/primary-antibodies/cd68-%28concentrate

Human research participants
Policy information about studies involving human research participants

Population characteristics
The study population comprised adults aged 20 to 80 years, of both sexes and of different races.
The associated clinical metadata include age, sex, race, comorbidities, eGFR and certain medications, and are detailed in Supplementary Table 3. The clinical conditions include AKI and CKD.

Recruitment
Participants were recruited from different sites, and IRB approval was obtained for the use of tissue and data for research in a de-identified manner. To obtain consent, coordinators approached participants after consulting with the clinical team, went over the study with them, and addressed any questions and concerns. Once consent was obtained, samples were procured and preserved in a timely manner using standardized protocols that have been published and are available on KPMP.org. AKI and CKD patients were recruited according to established clinical criteria (https://www.kpmp.org/for-clinicians).
The reference tissue samples were selected as they became available from patients with normal kidney function and/or age-appropriate histopathology. Samples obtained under waived consent are described in the ethics statement. The associated clinical and pathological data are provided in Supplementary Table 3 to help readers interpret the study results.

Ethics oversight
We have complied with all ethical regulations related to this study. Human samples (Supplementary Table 1

Informed consent was obtained for the use of data and samples from all participants at Washington University, including living patients undergoing partial or total nephrectomy, or for discarded kidneys from deceased donors. Cortical and papillary biopsy samples from patients with stone disease were obtained with informed consent at Indiana University and approved by the Indiana University Institutional Review Board (IRB #1010002261
